Thermodynamic Rule Determining the Biological DNA Information Capacity 
A rigorous thermodynamic expression is derived for the total biological information capacity per
unit length of a DNA molecule. The total information includes the usual four letter coding sequence
information plus the excess information coding often erroneously referred to as "junk". We conclude
that the currently understood human DNA code is about a hundred megabyte program written on
a molecule with about a ten gigabyte memory. By far, most of the programming code is not presently
understood.

PACS numbers: 82.39.Pj, 87.14.gk, 87.14.gn 



I. INTRODUCTION 

The information capacity $\mathcal{H}$ in a human DNA molecule
of length $L \sim 3$ meter arising from the conventional four
letter sequence code has been estimated to be [1]

$$\mathcal{H}_{\rm code} \sim 10^{9}\ \text{bit}. \tag{1}$$

This estimate is approximately correct for humans, apes
and perhaps snails. While the authors might admit some
similarities with the apes, they would object to being
called similar to snails. In defense of claiming that humans
are higher on the evolutionary scale, one might
include the so-called "junk" DNA residing in the chain.
The estimates are

$$\mathcal{H}_{\rm total} \sim 10^{11}\ \text{bit}. \tag{2}$$

In terms of the total information capacity, humans do 
indeed appear to be on a higher evolutionary scale than 
(say) snails. 
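
As a quick arithmetic check, the bit counts of Eqs. (1) and (2) can be
converted to bytes. The sketch below assumes 8 bits per byte and decimal
prefixes (1 megabyte = 10^6 byte, 1 gigabyte = 10^9 byte); the function
name is illustrative only and does not come from the text.

```python
BITS_PER_BYTE = 8  # assumed conversion factor


def bits_to_bytes(bits: float) -> float:
    """Convert an information content in bits to bytes."""
    return bits / BITS_PER_BYTE


code_bits = 1e9    # Eq. (1): four letter sequence code
total_bits = 1e11  # Eq. (2): estimate including "junk" DNA

code_megabytes = bits_to_bytes(code_bits) / 1e6
total_gigabytes = bits_to_bytes(total_bits) / 1e9

print(f"code:  {code_megabytes:.0f} megabyte")
print(f"total: {total_gigabytes:.1f} gigabyte")
```

Both values agree in order of magnitude with the hundred megabyte and
ten gigabyte figures quoted in the abstract.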

There has been considerable recent interest in the nature
of "junk" DNA sequences [2-5] and in particular the
role that they play in the evolutionary process. Our purpose
is to derive a thermodynamic expression for the information
which resides in DNA. In the thermodynamic
limit, the information capacity per unit length

$$\eta = \lim_{L\to\infty} \frac{\mathcal{H}}{L} \tag{3}$$

has a thermodynamic description which is both mathematically
rigorous and experimentally measurable.
The rule may be stated in terms of the DNA chain tension
$\tau$ as a function of temperature $T$ and the chemical
potentials $(\mu_1, \mu_2, \ldots, \mu_C)$ of the molecules which make
up the DNA chain; i.e.

$$\tau = \tau(T, \mu_1, \mu_2, \ldots, \mu_C). \tag{4}$$

Our central result concerns a precise expression for $\eta$:

Theorem: The information capacity per unit length of
a DNA chain is given by the thermodynamic expression

$$\eta = -\frac{1}{k_B \ln 2}\left(\frac{\partial \tau}{\partial T}\right)_{\mu_1,\mu_2,\ldots,\mu_C}, \tag{5}$$

wherein $k_B$ is Boltzmann's constant.

The rigorous proof of the theorem will be given in
Sec. III. The only assumptions of the proof reside in the
first and second laws of statistical thermodynamics. Otherwise,
the theorem is completely model independent.
The importance of a force determination of the information
capacity is that laboratory tweezer measurements,
either optical [6] or magnetic [7], uniquely determine the
DNA chain tension $\tau(T, \mu_1, \ldots, \mu_C)$.

In Sec. II the relationship between thermodynamic entropy
$S$ and information $\mathcal{H}$ is reviewed. The statistical
thermodynamics of long chain molecules is explored in
Sec. III and the proof of the central theorem is provided.
An order of magnitude statement for $\eta$ in the human
genome is discussed in the concluding Sec. IV. As one
moves up the evolutionary scale, the total information
capacity in the DNA molecule appears to increase.



II. ENTROPY AND INFORMATION 

The connection between entropy and information in
statistical thermodynamics is well understood [8]. The
number of microscopic states $\Omega$ of a macroscopic system
may be written as

$$\Omega = e^{S/k_B} = 2^{\mathcal{H}}, \tag{6}$$

wherein the thermodynamic entropy $S$ is determined by

$$S = k_B \ln \Omega \quad \text{where} \quad k_B \approx 1.38065 \times 10^{-23}\ \text{J/K}. \tag{7}$$

The information capacity measured in bits [9] is thereby

$$\mathcal{H} = \lg \Omega = \frac{\ln \Omega}{\ln 2} = \frac{S}{k_B \ln 2}, \tag{8}$$

where $\ln \equiv \log_e$ and $\lg \equiv \log_2$. A discussion of the
thermodynamic entropy of a DNA chain follows.
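
The conversion in Eq. (8) can be sketched numerically. The example
assumes an idealized chain in which each of N sites independently takes
one of four equally likely letters, so that the number of states is 4^N;
this toy model and the function name are illustrative assumptions and
are not part of the derivation.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def entropy_to_bits(entropy_j_per_k: float) -> float:
    """Information capacity in bits, H = S / (k_B ln 2), as in Eq. (8)."""
    return entropy_j_per_k / (K_B * math.log(2))


# Toy assumption: N independent sites with four equally likely letters,
# so Omega = 4**N and, by Eq. (7), S = k_B * N * ln 4.
n_sites = 1000
entropy = K_B * n_sites * math.log(4)

bits = entropy_to_bits(entropy)
print(f"{bits:.1f} bit for {n_sites} sites")  # two bits per site
```

In this toy model the entropy route reproduces the familiar two bits
per letter of a four letter code.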



III. STATISTICAL THERMODYNAMICS

If $\mathcal{E}$ denotes the energy of a molecular chain of length
$L$ and molecular composition numbers $\mathcal{N}_1, \mathcal{N}_2, \ldots, \mathcal{N}_C$,
then the first and second thermodynamic laws for quasi-static
processes read

$$d\mathcal{E} = T\,dS + \tau\,dL + \sum_{j=1}^{C} \mu_j\,d\mathcal{N}_j. \tag{9}$$

The DNA chain quantities $(\mathcal{E}, S, L, \mathcal{N}_1, \mathcal{N}_2, \ldots, \mathcal{N}_C)$ are
all extensive.

Employing extensive scaling

$$\lambda\mathcal{E} = \mathcal{E}(\lambda S, \lambda L, \lambda\mathcal{N}_1, \lambda\mathcal{N}_2, \ldots, \lambda\mathcal{N}_C), \tag{10}$$

one finds the Euler equation

$$\mathcal{E} = S\,\frac{\partial \mathcal{E}}{\partial S} + L\,\frac{\partial \mathcal{E}}{\partial L} + \sum_{j=1}^{C} \mathcal{N}_j\,\frac{\partial \mathcal{E}}{\partial \mathcal{N}_j}. \tag{11}$$

Eqs. (9) and (11) imply

$$\mathcal{E} = TS + \tau L + \sum_{j=1}^{C} \mu_j \mathcal{N}_j. \tag{12}$$

Taking the differential of Eq. (12) and comparing the result
to Eq. (9) yields

$$S\,dT + L\,d\tau + \sum_{j=1}^{C} \mathcal{N}_j\,d\mu_j = 0. \tag{13}$$

Defining the entropy per unit length $\sigma$ and the molecular
densities per unit length $(\Gamma_1, \Gamma_2, \ldots, \Gamma_C)$ by

$$\sigma = \lim_{L\to\infty} \frac{S}{L} \quad \text{and} \quad \Gamma_j = \lim_{L\to\infty} \frac{\mathcal{N}_j}{L} \tag{14}$$

together with Eq. (13) yields

$$d\tau = -\sigma\,dT - \sum_{j=1}^{C} \Gamma_j\,d\mu_j. \tag{15}$$

The entropy per unit length is thereby

$$\sigma = -\left(\frac{\partial \tau}{\partial T}\right)_{\mu_1,\mu_2,\ldots,\mu_C} = k_B\,\eta \ln 2, \tag{16}$$

allowing for the verification of our central theorem.
Eqs. (3), (8), (14) and (16) yield the required proof of
Eq. (5).
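
To illustrate how Eq. (16) might be applied to tweezer data, the sketch
below estimates the temperature derivative of the tension by a central
finite difference and converts it to an information capacity per unit
length. The tension function is a hypothetical stand-in for a measured
curve at fixed chemical potentials: it is taken to be linear in T, and
its slope and offset are made-up values chosen only so that the result
can be checked by hand.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def info_per_length(tension, temperature, dT=0.01):
    """eta = -(1 / (k_B ln 2)) * dtau/dT at fixed chemical potentials.

    `tension` is a callable T -> tau in newtons; the result is in bit/m.
    The derivative is approximated by a central finite difference.
    """
    dtau_dT = (tension(temperature + dT) - tension(temperature - dT)) / (2 * dT)
    return -dtau_dT / (K_B * math.log(2))


# Hypothetical tension curve (assumed values, not measured data).
SIGMA_0 = 1.0e-12  # N/K, assumed entropy per unit length
TAU_0 = 5.0e-10    # N, assumed offset


def toy_tension(T):
    return TAU_0 - SIGMA_0 * T


eta = info_per_length(toy_tension, 300.0)
eta_expected = SIGMA_0 / (K_B * math.log(2))  # closed form for the toy curve

print(f"eta ~ {eta / 8 / 1e9:.1f} byte/nanometer")
```

For real data the callable would be replaced by an interpolation of the
measured tension, and the step dT would have to be chosen with the
experimental noise in mind.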

IV. CONCLUSION 

In order to apply the theorem Eq. (5), one has to fix
the molecular chemical potentials. These chemical potentials
depend on the solution properties of the environment
in which the DNA molecule resides. Changing
these environmental parameters also changes the information
capacity per unit length of the DNA molecule.
We here stress that the thermodynamic rule includes the
total information capacity of all the possible biophysical
forms, e.g. four letter coding, "junk" DNA insertions
as well as semiconducting electrons existing in ordered
water shells coating the DNA chain. Typical values for
the human genome are of order $\eta \sim 10$ byte/nanometer.
This is completely consistent with Eq. (2). The total
information in a DNA molecule gets larger as one moves up
the evolutionary scale. We conclude that the currently
understood DNA code is about a 100 megabyte program
written on a molecule with about 10 gigabytes of memory
capacity. Most of the programming code is beyond our
understanding.
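
The consistency between the value of 10 byte/nanometer and Eq. (2) can
be checked with a few lines of arithmetic, using the length of about
3 meter from the Introduction and assuming 8 bits per byte. The same
numbers also give the magnitude of the temperature slope of the chain
tension implied by Eq. (16); that last estimate is an illustration
worked out here and is not a value quoted in the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

eta_bytes_per_nm = 10.0  # order of magnitude quoted for the human genome
length_m = 3.0           # chain length from the Introduction

# Assumed conversions: 8 bit per byte, 1e9 nanometer per meter.
eta_bits_per_m = eta_bytes_per_nm * 8 * 1e9
total_bits = eta_bits_per_m * length_m

# Magnitude of the tension slope from Eq. (16): k_B * eta * ln 2.
slope_n_per_k = K_B * eta_bits_per_m * math.log(2)

print(f"total capacity: {total_bits:.1e} bit")
print(f"|dtau/dT|: {slope_n_per_k * 1e12:.2f} pN/K")
```

The total comes out at a few times 10^11 bit, the order of magnitude of
Eq. (2), and the implied slope is somewhat below one piconewton per
kelvin.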